Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
-
Koblmüller, Stephan (Ed.)
Within social hierarchies, rank can be dynamic and modulated by changes in molecular and/or physiological substrates. Here, we sought to better understand how social environment and rank shape male spawning behaviors and outcomes in the African cichlid fish Astatotilapia burtoni. First, using a social dyad paradigm, we generated territorial (T)/non-territorial (NT) male pairs. After a stable social hierarchy was established, the behaviors of the Ts and NTs were recorded and scored. The pairs were then separated, and each male was moved individually into a spawning phase: a new tank with novel females and no other males, where his behaviors were again scored. While previous studies have shown that territorial and non-territorial males have distinct behavioral profiles, we sought to deepen this interpretation by focusing on the latency of decision making and on transition matrices representing enriched sequences of behavior. We found that although stably territorial and ascending males share courtship behaviors in the spawning phase, only the animals that were territorial in the dyad phase were reproductively successful in the subsequent 16 h spawning phase.
Free, publicly-accessible full text available November 28, 2025
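As a rough illustration of the transition-matrix idea, the sketch below tallies a first-order transition matrix from one scored behavior sequence: row i gives the fraction of times behavior i was immediately followed by each behavior j. The behavior labels, the sequence, and the helper name transition_matrix are hypothetical, and the abstract does not describe how enrichment was tested, so this covers only the basic counting step.

```python
def transition_matrix(sequence, states):
    """Count and normalize transitions between consecutive behaviors.

    Returns (counts, probs), where probs[i][j] estimates
    P(next = states[j] | current = states[i]). A behavior that is never
    followed by another event keeps a row of zeros.
    """
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for current, nxt in zip(sequence, sequence[1:]):
        counts[index[current]][index[nxt]] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return counts, probs


# Hypothetical scored sequence for one male (labels are illustrative only).
behaviors = ["chase", "quiver", "lead", "quiver", "lead", "chase"]
states = ["chase", "quiver", "lead"]
counts, probs = transition_matrix(behaviors, states)
```

Comparing such matrices between territorial and ascending males, or against shuffled sequences, is one plausible way to look for enriched transitions, but that is an assumption rather than the authors' stated procedure.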
-
Leitner, Stephan (Ed.)
Objective: Peer review frequently follows a process where reviewers first provide initial reviews, authors respond to these reviews, and then reviewers update their reviews based on the authors' response. There is mixed evidence regarding whether this process is useful, including frequent anecdotal complaints that reviewers insufficiently update their scores. In this study, we investigate whether reviewers anchor to their original scores when updating their reviews, which would be a potential explanation for the lack of updates in reviewer scores.
Design: We design a novel randomized controlled trial to test whether reviewers exhibit anchoring. In the experimental condition, participants initially see a flawed version of a paper that is corrected after they submit their initial review, while in the control condition, participants see only the correct version. We take various measures to ensure that, in the absence of anchoring, reviewers in the experimental group should revise their scores to be identically distributed to the scores from the control group. Furthermore, we construct the reviewed paper to maximize the difference between the flawed and corrected versions, and employ deception to hide the true purpose of the experiment.
Results: Our randomized controlled trial has 108 researchers as participants. First, we find that our intervention succeeded in creating a difference in perceived paper quality between the flawed and corrected versions: using a permutation test with the Mann-Whitney U statistic, we find that the experimental group's initial scores are lower than the control group's scores in both the Evaluation category (Vargha-Delaney A = 0.64, p = 0.0096) and the Overall score (A = 0.59, p = 0.058). Next, we test for anchoring by comparing the experimental group's revised scores with the control group's scores. We find no significant evidence of anchoring in either the Overall score (A = 0.50, p = 0.61) or the Evaluation category (A = 0.49, p = 0.61).
The Mann-Whitney U represents the number of individual pairwise comparisons across groups in which the value from the specified group is stochastically greater, while the Vargha-Delaney A is the normalized version in [0, 1].
Free, publicly-accessible full text available November 18, 2025
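The relationship between U and A described above can be made concrete with a short sketch. It counts pairwise wins across groups to get U, divides by the number of pairs to get A, and runs a simple label-permutation test using U as the statistic. Counting ties as one half, the one-sided direction of the test, the number of permutations, and the example scores are all assumptions for illustration; the abstract does not give these details.

```python
import random


def mann_whitney_u(x, y):
    """U for group x: number of cross-group pairs where x wins (ties count 1/2)."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def vargha_delaney_a(x, y):
    """A = U / (len(x) * len(y)), i.e. U normalized to [0, 1]."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """One-sided p-value (assumed direction: x tends to exceed y), statistic = U."""
    rng = random.Random(seed)
    observed = mann_whitney_u(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of scores into the two groups
        if mann_whitney_u(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical review scores, not data from the study.
control = [5, 6, 6, 7, 4]
experimental = [3, 4, 5, 5, 4]
p = permutation_p_value(control, experimental)
```

With this convention, U(x, y) + U(y, x) always equals len(x) * len(y), so A = 0.5 corresponds to no tendency for either group to score higher. This matches the reading of A = 0.50 above as "no difference".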
-
Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
Spatial evolutionary games are used to model large systems of interacting agents. In earlier work, a method was developed that uses Bayesian Networks to approximate the population dynamics in these games. One advantage of that approach is that the size of the network can be adjusted smoothly to obtain more accurate approximations. However, scaling the method up can become intractable as the number of strategies in the evolutionary game increases. In this paper, we propose a new method for computing more accurate approximations using surrogate Bayesian Networks. Instead of performing inference on larger networks directly, we perform it on a much smaller surrogate network extended with parameters that exploit the symmetry inherent to the domain. We learn the parameters of the surrogate network using KL divergence as the loss function. We demonstrate the value of this method empirically through comparisons on several evolutionary games.
Free, publicly-accessible full text available May 2, 2026
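To show what "KL divergence as the loss function" means in the simplest case, the toy sketch below fits softmax logits so that a parametric distribution over strategies matches a target distribution, minimizing KL(target || q) by gradient descent. The gradient with respect to the logits is q - target. The target frequencies, learning rate, and function names are hypothetical. The paper's surrogate network and its symmetry-based parameters are not modeled here, so this is only an illustration of the loss, not the method.

```python
import numpy as np


def softmax(theta):
    z = np.exp(theta - theta.max())  # subtract max for numerical stability
    return z / z.sum()


def kl(p, q):
    """KL(p || q) for discrete distributions, skipping zero-probability entries of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def fit_to_target(target, steps=2000, lr=0.5):
    """Gradient descent on KL(target || softmax(theta)) with respect to theta."""
    theta = np.zeros_like(target)
    for _ in range(steps):
        q = softmax(theta)
        theta -= lr * (q - target)  # exact gradient of the KL loss w.r.t. the logits
    return softmax(theta)


# Hypothetical strategy frequencies produced by a larger reference model.
target = np.array([0.6, 0.3, 0.1])
q = fit_to_target(target)
```

In this toy setting, a softmax with one logit per strategy can represent any target exactly, so the loss goes to zero. A real surrogate network has far fewer free parameters than the model it approximates, and the fitted KL would generally stay positive.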
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
Free, publicly-accessible full text available May 15, 2026
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
Many real-world situations allow for the acquisition of additional relevant information when making decisions with limited or uncertain data. However, traditional RL approaches either require all features to be acquired beforehand (e.g., in an MDP) or regard some of them as missing data that cannot be acquired (e.g., in a POMDP). In this work, we consider RL models that may actively acquire features from the environment to improve decision quality and certainty, while automatically balancing the cost of the feature acquisition process against the reward of the task decision process. We propose the Active-Acquisition POMDP (AA-POMDP) and identify two types of acquisition process for different application domains. To assist the agent in this actively acquired, partially observed environment and to alleviate the exploration-exploitation dilemma, we develop a model-based approach in which a deep generative model captures the dependencies among features and imputes the unobserved ones. The imputations essentially represent the agent's beliefs. Equipped with this dynamics model, we develop hierarchical RL algorithms to solve both types of AA-POMDP. Empirical results demonstrate that our approach achieves considerably better performance than existing POMDP-RL solutions.
Free, publicly-accessible full text available May 5, 2026
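The trade-off between acquisition cost and decision reward can be seen in a deliberately tiny setting. This sketch computes a one-step (myopic) value of information: the agent guesses a hidden binary state and receives reward 1 if the guess is correct. It can first pay a cost to observe one noisy binary feature. Acquiring is worthwhile only when the expected gain in reward exceeds that cost. The binary state, the feature accuracy, and all function names are hypothetical. This is a textbook value-of-information calculation, not the paper's AA-POMDP algorithm, which uses a deep generative model and hierarchical RL.

```python
def decide_now_value(prior):
    """Expected reward of guessing the more likely state without new information."""
    return max(prior, 1 - prior)


def acquire_then_decide_value(prior, accuracy):
    """Expected reward after observing one noisy feature and then guessing.

    prior: P(state = 1); accuracy: P(feature equals the hidden state).
    For each observation, the best guess earns the larger joint probability.
    """
    obs_1 = max(accuracy * prior, (1 - accuracy) * (1 - prior))
    obs_0 = max((1 - accuracy) * prior, accuracy * (1 - prior))
    return obs_1 + obs_0


def should_acquire(prior, accuracy, cost):
    """Acquire only if the expected improvement in reward exceeds the feature's cost."""
    gain = acquire_then_decide_value(prior, accuracy) - decide_now_value(prior)
    return gain > cost, gain


# An uncertain agent benefits from a fairly reliable feature...
acquire, gain = should_acquire(prior=0.5, accuracy=0.9, cost=0.1)
# ...while an agent that is already confident gains nothing from it.
acquire_confident, _ = should_acquire(prior=0.99, accuracy=0.9, cost=0.01)
```

When the agent is already nearly certain, no observation can change its guess, so the gain is zero and any positive cost rules out acquisition.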
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
Free, publicly-accessible full text available May 3, 2026
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
Free, publicly-accessible full text available May 3, 2026
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy using only experience from another, potentially unknown, behavior policy. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the curse of horizon. However, the major bottleneck in applying DICE estimators is the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a linear representation of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. This primal-dual representation not only bypasses the non-convex non-concave optimization in vanilla DICE, thereby enabling a computationally efficient algorithm, but also paves the way for more efficient use of historical data. We highlight that our algorithm, SpectralDICE, is the first to leverage a linear representation of primal-dual variables that is both computationally and sample efficient; its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.
Free, publicly-accessible full text available May 3, 2026
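To make the "distribution correction ratio" concrete, the sketch below works in a small tabular MDP where everything can be computed exactly. It first solves for the target policy's normalized discounted state-action occupancy d_pi. It then forms the ratio w = d_pi / d_D against the data distribution d_D, and estimates the policy value as E_D[w * r] / (1 - gamma) from samples drawn from d_D. As a check, it also computes the value directly from the Bellman equation. The random MDP, the uniform data distribution, and all function names are hypothetical. This is not SpectralDICE: it uses neither the spectral decomposition nor any learned representation, and only shows the quantity that DICE-style estimators target.

```python
import numpy as np


def occupancy(P, pi, mu0, gamma):
    """Normalized discounted state-action occupancy d_pi, shape (S, A).

    P[s, a, t] = P(t | s, a); pi[s, a] = target policy; mu0[s] = start distribution.
    Solves d = (1 - gamma) * (mu0 * pi) + gamma * P_pi^T d.
    """
    S, A, _ = P.shape
    P_pi = np.einsum("sat,tb->satb", P, pi).reshape(S * A, S * A)
    init = (mu0[:, None] * pi).reshape(S * A)
    d = np.linalg.solve(np.eye(S * A) - gamma * P_pi.T, (1 - gamma) * init)
    return d.reshape(S, A)


def bellman_value(P, R, pi, mu0, gamma):
    """Reference value J = mu0 . V, with V = (I - gamma P^pi)^{-1} r^pi."""
    S = P.shape[0]
    P_s = np.einsum("sat,sa->st", P, pi)  # state-to-state transitions under pi
    r_s = (R * pi).sum(axis=1)            # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_s, r_s)
    return float(mu0 @ V)


rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # hypothetical random MDP
R = rng.uniform(size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)      # hypothetical target policy
mu0 = np.full(S, 1.0 / S)
d_data = np.full((S, A), 1.0 / (S * A))     # assumed behavior data: uniform over (s, a)

w = occupancy(P, pi, mu0, gamma) / d_data   # stationary distribution correction ratio
idx = rng.choice(S * A, size=100_000, p=d_data.ravel())
J_hat = np.mean(w.ravel()[idx] * R.ravel()[idx]) / (1 - gamma)
```

In realistic problems, d_pi cannot be solved for directly because P is unknown and the state space is large. This is why DICE estimators learn w through a saddle-point problem, which is the optimization the abstract describes SpectralDICE as bypassing.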
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
We study the problem of causal effect estimation in the presence of unobserved confounders, focusing on two settings: instrumental variable (IV) regression with additional observed confounders, and proxy causal learning. Our approach uses a singular value decomposition of a conditional expectation operator combined with a saddle-point optimization method. In the IV regression setting, this can be viewed as a neural network generalization of the seminal approach due to Darolles et al. (2011). Saddle-point formulations have recently gained attention because they mitigate the double-sampling bias and are compatible with modern function approximation methods. We provide experimental validation across various settings and show that our approach outperforms existing methods on common benchmarks.
Free, publicly-accessible full text available May 3, 2026
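For intuition about why an instrument helps when confounders are unobserved, the sketch below uses the simplest linear instance of IV regression, two-stage least squares (2SLS), on simulated data. A hidden confounder u drives both x and y, so ordinary least squares overstates the effect of x. The instrument z moves x but affects y only through x, which lets 2SLS recover the true coefficient. The data-generating process, the true effect of 2.0, and the function names are hypothetical. This linear baseline is not the authors' SVD and saddle-point method, and it does not address the observed-confounder or proxy settings.

```python
import numpy as np


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


def two_stage_least_squares(z, x, y):
    """Linear IV estimate of the effect of x on y, using z as the instrument."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # stage 1: project x onto z
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][1]  # stage 2: regress y on x_hat


rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)  # instrument (hypothetical): affects y only through x
u = rng.normal(size=n)  # unobserved confounder
x = z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect of x on y is 2.0

beta_ols = ols_slope(x, y)                    # biased upward by the confounder
beta_iv = two_stage_least_squares(z, x, y)    # close to 2.0
```

Under this simulation, OLS converges to Cov(x, y) / Var(x) = 9 / 3 = 3, while 2SLS converges to Cov(z, y) / Cov(z, x) = 2. Nonparametric approaches such as the one cited above replace the linear first stage with a conditional expectation operator.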
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
Free, publicly-accessible full text available May 3, 2026
